Bryan Jurish

Tools, Toys, and Filters 

A Tinker’s Apology 


»Use filters« - Brian Eno & Peter Schmidt, 
Oblique Strategies, 1975 


As a tinker of algorithms, a tweaker of data 
structures, and a dyed-in-the-wool Platonist, I am 
committed to the (objective) existence of mathe- 
matical entities such as numbers and the relations 
between them. I nonetheless follow Christiane Birr 
in her skepticism regarding Anderson’s (2008) 
blithe assertion that »given enough data, the numbers
speak for themselves«. Numbers seldom lie per
se, but nor are they renowned for their loquacity.
The traditional distinction between deductive 
truths about formal objects and inductive inter- 
pretations of empirical data acquired from a data 
sample such as a text corpus is useful here. Computers
are well-suited to deductive tasks involving
counting and other numerical manipulations, and 
they are quite reliable for such purposes. They are 
not very adept at deciding, for example, what to 
count (corpus selection) or drawing (creative, interpretative)
conclusions based on (numerical) data.
As Silke Schwandt suggested: »I cannot find
what I didn’t look for«, and »the interpretative act 
is my own«. I submit that computational tools for 
humanities research (»DH«, »Digital Humanities«) 
are best understood as filters in the sense of 
Shannon’s (1948) model of communication, also 
cited in the current context by Manfred Thaller. 

In terms of Shannon’s model, we should first 
acknowledge that natural language itself is a »los- 
sy« or »noisy« encoding/decoding scheme (»co- 
dec«): ambiguity, under-specification, and other 
opportunities for misinterpretation abound in lin- 
guistic communication (Reddy, 1979). DH tools 
acting on text data typically compress the (already 
error-laden) signal further by applying a tool-spe- 
cific data model (e. g. word counts), performing 
formal manipulations on that representation, and 
formatting the results for human inspection. In 
terms of Shannon’s model, this is simply an 
additional encoding applied to the (already text- 
encoded) original message or, in other words, a 
filter. A »lossy« filter degrades messages passed 
through it: most exploratory DH tools fall into
this category, on the one hand because a desire for
high compression rates is implicit in their design
(we already have the text-encoding), and on the
other because a precise characterization of the
formal models required for a 1:1 reproduction of
the original (semantic, communicative-intentional,
transmitter-internal) message has thus far
eluded us (and possibly always will).
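
A minimal sketch of such a lossy filter may help make the point concrete. It assumes a deliberately naive data model (lowercased word counts); the function names `word_count_filter` and `format_for_inspection` and the two example sentences are hypothetical illustrations, not part of any particular DH tool.

```python
import re
from collections import Counter


def word_count_filter(text: str) -> Counter:
    """Hypothetical data model: lowercase the text and count word tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


def format_for_inspection(counts: Counter, top_n: int = 3) -> str:
    """Format the most frequent words (ties broken alphabetically)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ", ".join(f"{word}: {n}" for word, n in ranked[:top_n])


# Two messages with clearly different meanings ...
message_a = "The dog bit the man."
message_b = "The man bit the dog."

encoded_a = word_count_filter(message_a)
encoded_b = word_count_filter(message_b)

# ... collapse onto the same encoding: word order (and with it, who bit
# whom) is discarded by the filter and cannot be recovered from its output.
print(encoded_a == encoded_b)            # True
print(format_for_inspection(encoded_a))  # the: 2, bit: 1, dog: 1
```

The compression is high (five tokens reduce to four counts, and the savings grow with text length), but the price is exactly the kind of degradation described above: the original message cannot be reproduced 1:1 from what the filter passes on.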

Lossy filters should not disturb us, however. As 
humans, we come equipped with (are predisposed 
to) a whole bevy of integrated filters: linguistic 
filters for parsing (minimal attachment) and interpretation
(semantic priming), perceptual ones for
motion detection and voice recognition, cognitive 
filters for object independence and causal relations, 
as well as cultural ones for shared experience and 
common knowledge. Adding another (lossy) filter 
to our data intake process increases the informa- 
tional »distance« in Moretti’s sense, but does not 
change the fact that the communication channel 
between the transmitter (text, author, object) and 
the receiver (ourselves, subjects, minds) is already 
noisy (i. e. fallible). The »intuitivity« often predi- 
cated of DH tools is nothing more or less than an 
exploitation of the human users’ pre-existing perceptual
/ cognitive / cultural filters by use of color,
motion, size, or shared metaphors such as tag- 
clouds, time series or histogram plots, etc. Such 
exploitation can be considered successful to the 
extent that all and only the relevant data is passed
through both the programmatic and user-inte- 
grated filters, as when the most »interesting« fea- 
ture of the data is the most visually striking 
element of the presentation format. 

The validity of interpretative conclusions drawn 
from empirical input is a well-known epistemo- 
logical problem: that of induction. The question 
for DH is whether or not we are willing to accept 
yet another layer of filters on the data we consume. 
There are good reasons to do so. Perceptual filters 
tend to act as a »fast lane« for salient environmental 
data: visual sensitivity to motion for example can 
alert us to the potential presence of a predator. We
retain the option of subsequently redirecting our
conscious attention to the detected phenomenon
for detailed inspection and interpretation (»was the
motion caused by a hungry tiger or a frightened 
rabbit?«). Exploratory DH tools can act similarly as 
a »fast lane« for salient cultural data, constructed to 
facilitate subsequent refocusing on a detailed inspection
(close reading) of »interesting« phenomena,
where »interest« is a function of the user’s
individual research program. DH tools need not 
replace traditional close readings, but can instead 
act as »coarse caricatures« or »executive summaries«
indicating which (textual) phenomena might
warrant more careful study. As tinkers, our task is 
to minimize the apprehended lossiness of the filters
by optimizing our data models and manipulations
for the users’ common research goals, analogous to
the optimization of popular audio codecs (e. g.
mp3, ogg) for the human auditory perceptual 
apparatus. This can be a frustrating task, since the 
research goals of humanities scholars can vary 
widely, and commonalities can be difficult to 
identify and formally characterize. As noted by 
various colleagues, communication and compro- 
mise between humanities scholars and tool build- 
ers working together is the most promising path 
for improvement in this regard. 

Implicit above is the assumption that use of 
computational tools does not itself affect humanities
scholars’ underlying research goals. I propose
that DH methods can however alter the tempo and
spirit of (certain aspects of) the humanities research
process: speedy responses and intuitive (exploitative)
interfaces can allow a »playful« interaction
with the underlying data and rapid (»agile«)
adaptation of (potential, proto-) research questions
in response to the (real, »objective«) formal properties
of the sample as encoded by the method in
question. Here again, the key element is the cohe- 
sion of the tool codec (i. e. the data model and 
presentation format), the user’s research interests, 
and his or her pre-existing perceptual/cognitive 
filters. Playful interaction implies that I as a user
am open to distraction and continuous creative
re-invention of the activity at hand, which means that
I must have sufficient cognitive resources available
for re-allocation. If I can rely on my integrated 
perceptual / cognitive apparatus to inform me of 
»interesting« phenomena - if the programmatic, 
scholarly, and integrated perceptual / cognitive filters
cohere - then I likely have such resources
available. Otherwise, distractions tend to be simply 
»irritating«. 

As a final observation, the issue of cohesion is
also of central importance to my own work as a 
builder of computational tools. These must be 
evaluated on at least two independent scales. In- 
trinsic properties such as correctness and complex- 
ity can be formally evaluated and discussed in the 
methodological domain (computer science, computational
linguistics, etc.). As tools, they must also
be evaluated in terms of extrinsic properties such as 
flexibility and utility, which are only predicable
relative to one or more given user-dependent tasks. 
I as a tinker therefore humbly ask for the help, 
patience, and cooperation of curious humanities
scholars, that together we might develop less irritating,
less restrictive, more interesting, and more
coherent tools (and toys). 